Building a quantum analog of classical deep neural networks represents a fundamental challenge in quantum computing. A key issue is how to address the inherent non-linearity of classical deep learning, a problem in the quantum domain because the composition of an arbitrary number of quantum gates, a sequence of unitary transformations, is intrinsically linear. This problem has been variously approached in the literature, principally via the introduction of measurements between layers of unitary transformations. In this paper, we introduce the Quantum Path Kernel, a formulation of quantum machine learning capable of replicating those aspects of deep machine learning typically associated with superior generalization performance in the classical domain, specifically, hierarchical feature learning. Our approach generalizes the notion of the Quantum Neural Tangent Kernel, which has been used to study the dynamics of classical and quantum machine learning models. The Quantum Path Kernel exploits the parameter trajectory, i.e. the curve delineated by the model parameters as they evolve during training, enabling the representation of differential layer-wise convergence behaviors, or the formation of hierarchical parametric dependencies, in terms of their manifestation in the gradient space of the predictor function. We evaluate our approach on variants of Gaussian XOR mixture classification, an artificial but emblematic problem that intrinsically requires multilevel learning to achieve optimal class separation.
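As a rough classical illustration of the path-kernel idea (not the paper's quantum implementation), the sketch below averages the tangent-kernel Gram matrix over a recorded parameter trajectory; `grad_fn`, `thetas`, and `X` are hypothetical stand-ins for the predictor's gradient, the saved training iterates, and the inputs.

```python
import numpy as np

def path_kernel(grad_fn, thetas, X):
    """Approximate a path kernel by averaging the tangent kernel
    along the recorded training trajectory.

    grad_fn(theta, x) -> gradient of the predictor f(x; theta) w.r.t. theta
    thetas            -> parameter vectors saved at each training step
    X                 -> sequence of n input points
    """
    K = np.zeros((len(X), len(X)))
    for theta in thetas:                              # integrate along the path
        G = np.stack([grad_fn(theta, x) for x in X])  # (n, p) gradient matrix
        K += G @ G.T                                  # tangent kernel at theta
    return K / len(thetas)                            # discretized path average
```

Evaluated at a single theta, this reduces to the usual (neural or quantum) tangent kernel; averaging over the trajectory is what lets the kernel reflect differential convergence across the parameters.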
This paper presents the first end-to-end application of a Quantum Support Vector Machine (QSVM) algorithm to a classification problem in the financial payment industry, implemented with the Qiskit software stack. Based on real card payment data, a thorough comparison is carried out to assess the complementary impact of current state-of-the-art quantum machine learning algorithms with respect to classical approaches. A new method of searching for the best features is explored using the feature-map characteristics of the quantum support vector machine. Results are compared using fraud-specific key performance indicators, namely accuracy, recall, and false positive rate, extracted from analyses based on human expertise (rule-based decisions), classical machine learning algorithms (Random Forest, XGBoost), and quantum-based machine learning algorithms. In addition, a hybrid classical-quantum approach is explored by using an ensemble model that combines classical and quantum algorithms to better improve fraud-prevention decisions. We find that, as expected, the results depend strongly on feature selection and on the algorithms used to select the features. The QSVM provides a complementary exploration of the feature space, improving the fraud detection accuracy of the hybrid quantum-classical method on a drastically reduced dataset that fits the current state of quantum hardware.
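For readers unfamiliar with the QSVM pattern on Qiskit, the following minimal sketch shows the generic construction, a quantum kernel plugged into a classical SVM; the ZZFeatureMap and the random toy data are assumptions standing in for the proprietary card-payment features, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.svm import SVC
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.kernels import FidelityQuantumKernel

# Toy stand-in for engineered card-payment features (the real data is proprietary).
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(40, 2)), rng.integers(0, 2, size=40)

feature_map = ZZFeatureMap(feature_dimension=2, reps=2)   # encode features into a circuit
qkernel = FidelityQuantumKernel(feature_map=feature_map)  # k(x, x') = |<phi(x)|phi(x')>|^2

qsvm = SVC(kernel=qkernel.evaluate)  # classical SVM driven by the quantum kernel
qsvm.fit(X_train, y_train)
```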
Quask is quantum machine learning software written in Python that supports researchers in designing, experimenting with, and assessing the performance of different quantum and classical kernels. The software is platform-agnostic and can be integrated with all major quantum software packages (e.g. IBM Qiskit, Xanadu's Pennylane, Amazon Braket). Quask guides the user through a simple preprocessing of the data and the definition and computation of quantum and classical kernels, whether custom or predefined. From this evaluation, the package provides an assessment of potential quantum advantage and predictive bounds on the generalization error. Moreover, it allows the generation of parametric quantum kernels that can be trained using gradient-based optimization, grid search, or genetic algorithms. Projected quantum kernels, an effective solution that mitigates the curse of dimensionality induced by the exponential scaling of large Hilbert spaces, are also computed. Quask can also generate the observable values of a quantum model and use them to study the predictive power of quantum and classical kernels.
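Quask's own API is not reproduced here; as a hedged illustration of the projected-quantum-kernel idea the abstract refers to, the sketch below computes a kernel from one-qubit reduced density matrices of simulated statevectors (in the style of Huang et al.), with `gamma` as a free bandwidth parameter.

```python
import numpy as np

def one_qubit_rdms(state, n_qubits):
    """One-qubit reduced density matrices of an n-qubit statevector."""
    psi = state.reshape([2] * n_qubits)
    rdms = []
    for k in range(n_qubits):
        # move qubit k's axis to the front, flatten the rest, trace it out
        m = np.moveaxis(psi, k, 0).reshape(2, -1)
        rdms.append(m @ m.conj().T)
    return rdms

def projected_kernel(state_x, state_y, n_qubits, gamma=1.0):
    """k(x, x') = exp(-gamma * sum_k ||rho_k(x) - rho_k(x')||_F^2)."""
    rx = one_qubit_rdms(state_x, n_qubits)
    ry = one_qubit_rdms(state_y, n_qubits)
    dist = sum(np.linalg.norm(a - b) ** 2 for a, b in zip(rx, ry))
    return np.exp(-gamma * dist)
```

Because the kernel depends only on n single-qubit density matrices rather than the full 2^n-dimensional state, it sidesteps the concentration problems that affect fidelity kernels on many qubits.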
Generative modeling is a promising task for near-term quantum devices, which can use the randomness of quantum measurements as a source of randomness. So-called Born machines are purely quantum models that promise to generate probability distributions in a quantum way that is inaccessible to classical computers. This paper presents an application of Born machines to Monte Carlo simulations and extends their reach to multivariate and conditional distributions. The models are run on (noisy) simulators and on IBM Quantum superconducting hardware. More specifically, Born machines are used to generate muonic force carrier (MFC) events produced by the scattering process between muons and the detector material in high-energy physics collider experiments. MFCs are bosons appearing in beyond-Standard-Model theoretical frameworks, and they are candidates for dark matter. Empirical evidence shows that Born machines can reproduce the marginal distributions and correlations of datasets from Monte Carlo simulations.
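As a minimal classical illustration of the Born-rule sampling underlying such models (not the paper's parameterized circuits or conditioning mechanism), the sketch below draws bitstrings from the output distribution of an arbitrary unitary applied to |0...0>.

```python
import numpy as np

def born_sample(unitary, n_shots, rng=None):
    """Sample bitstrings from a Born machine: p(x) = |<x|U|0...0>|^2."""
    if rng is None:
        rng = np.random.default_rng()
    psi = unitary[:, 0]              # U applied to |0...0> is U's first column
    probs = np.abs(psi) ** 2         # Born rule
    outcomes = rng.choice(len(psi), size=n_shots, p=probs)
    width = int(np.log2(len(psi)))
    return [format(o, f"0{width}b") for o in outcomes]

# Toy example: a random 2-qubit unitary obtained via QR decomposition
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
print(born_sample(q, n_shots=5, rng=rng))
```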
Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently encode salient information about underlying program functionality into rich, pixel-based data representations. This paper offers one of the first comprehensive empirical investigations into the connection between GUIs and functional, natural language descriptions of software. First, we collect, analyze, and open source a large dataset of functional GUI descriptions consisting of 45,998 descriptions for 10,204 screenshots from popular Android applications. The descriptions were obtained from human labelers and underwent several quality control mechanisms. To gain insight into the representational potential of GUIs, we investigate the ability of four Neural Image Captioning models to predict natural language descriptions of varying granularity when provided a screenshot as input. We evaluate these models quantitatively, using common machine translation metrics, and qualitatively through a large-scale user study. Finally, we offer lessons learned and a discussion of the potential shown by multimodal models to enhance future techniques for automated software documentation.
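As a hedged sketch of the kind of machine-translation-metric evaluation the study describes, the snippet below computes corpus-level BLEU with NLTK over made-up GUI captions; the references and hypotheses are illustrative, not taken from the dataset.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical references (possibly several per screenshot) and model outputs.
references = [
    [["screen", "for", "logging", "into", "the", "app"]],
    [["settings", "page", "with", "notification", "toggles"]],
]
hypotheses = [
    ["login", "screen", "for", "the", "app"],
    ["settings", "page", "with", "toggles"],
]

# Corpus-level BLEU with smoothing, as is common for short captions.
score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```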
Background: Image analysis applications in digital pathology include various methods for segmenting regions of interest. Their identification is one of the most complex steps and is therefore of great interest for the study of robust methods that do not necessarily rely on a machine learning (ML) approach. Method: A fully automatic and optimized segmentation process for different datasets is a prerequisite for classifying and diagnosing Indirect ImmunoFluorescence (IIF) raw data. This study describes a deterministic computational neuroscience approach for identifying cells and nuclei. It departs from the conventional neural network approach, but matches neural networks in quantitative and qualitative performance, and it is also robust to adversarial noise. The method is robust, based on formally correct functions, and does not require tuning on specific datasets. Results: This work demonstrates the robustness of the method against variability of parameters such as image size, mode, and signal-to-noise ratio. We validated the method on two datasets (Neuroblastoma and NucleusSegData) using images annotated by independent medical doctors. Conclusions: The definition of deterministic and formally correct methods, from a functional to a structural point of view, guarantees the achievement of optimized and functionally correct results. The excellent performance of our deterministic method (NeuronalAlg) in segmenting cells and nuclei from fluorescence images was measured with quantitative indicators and compared with that achieved by three published ML approaches.
Detecting anomalous data within time series is a highly relevant task in pattern recognition and machine learning, with many possible applications ranging from disease prevention in medicine (e.g., detecting early alterations in health status before they can clearly be classified as "illness") to monitoring industrial plants. Regarding the latter application, detecting anomalies in an industrial plant's status first of all prevents serious damage that would require a long interruption of the production process. Secondly, it permits optimal scheduling of maintenance interventions, limiting them to urgent situations rather than the fixed prudential schedules typically followed today, in which components are replaced well before the end of their expected lifetime. This paper describes a case study on monitoring the status of the batteries of Laser-Guided Vehicles (LGVs), carried out as our contribution to project SUPER (Supercomputing Unified Platform, Emilia Romagna), which aims to establish and demonstrate a regional High-Performance Computing platform that will become the main Italian supercomputing environment in terms of both computing power and data volume.
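The abstract does not specify the detector used in the case study; as a generic baseline sketch for battery telemetry of this kind, the snippet below flags points whose rolling z-score exceeds a threshold, with the signal, window, and threshold all being hypothetical.

```python
import numpy as np
import pandas as pd

def rolling_zscore_anomalies(series, window=96, threshold=4.0):
    """Flag samples whose rolling z-score exceeds a threshold.

    series    -> pandas Series of a telemetry signal (e.g. battery voltage)
    window    -> number of past samples defining the local baseline
    threshold -> how many standard deviations count as anomalous
    """
    mean = series.rolling(window).mean()
    std = series.rolling(window).std()
    z = (series - mean) / std
    return series[z.abs() > threshold]

# Toy signal: a slow discharge curve with one injected fault at t=700.
rng = np.random.default_rng(0)
t = np.arange(1000)
signal = pd.Series(24.0 - 0.001 * t + rng.normal(0, 0.05, size=1000))
signal.iloc[700] -= 2.0
print(rolling_zscore_anomalies(signal))
```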
Graph Neural Networks (GNNs) achieve state-of-the-art performance on graph-structured data across numerous domains. Their underlying ability to represent nodes as summaries of their vicinities has proven effective for homophilous graphs in particular, in which same-type nodes tend to connect. On heterophilous graphs, in which different-type nodes are likely connected, GNNs perform less consistently, as neighborhood information might be less representative or even misleading. On the other hand, GNN performance is not inferior on all heterophilous graphs, and there is a lack of understanding of what other graph properties affect GNN performance. In this work, we highlight the limitations of the widely used homophily ratio and the recent Cross-Class Neighborhood Similarity (CCNS) metric in estimating GNN performance. To overcome these limitations, we introduce 2-hop Neighbor Class Similarity (2NCS), a new quantitative graph structural property that correlates with GNN performance more strongly and consistently than alternative metrics. 2NCS considers two-hop neighborhoods as a theoretically derived consequence of the two-step label propagation process that governs the training and inference of GCNs. Experiments on one synthetic and eight real-world graph datasets confirm consistent improvements over existing metrics in estimating the accuracy of GCN- and GAT-based architectures on the node classification task.
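The paper's exact 2NCS formula is not reproduced here; the following sketch computes a CCNS-style average cosine similarity of class histograms taken over two-hop neighborhoods, which is one plausible reading of the construction (nodes are assumed to be labeled 0..n-1 and `labels` to be an integer array).

```python
import numpy as np
import networkx as nx

def two_hop_label_histograms(G, labels, n_classes):
    """Class histogram of each node's two-hop neighborhood.

    Assumes nodes are 0..n-1 and labels is an integer numpy array."""
    H = np.zeros((G.number_of_nodes(), n_classes))
    for v in G.nodes:
        dists = nx.single_source_shortest_path_length(G, v, cutoff=2)
        for u, d in dists.items():
            if d > 0:                       # exclude the node itself
                H[v, labels[u]] += 1
    sums = H.sum(axis=1, keepdims=True)     # normalize rows to distributions
    return np.divide(H, sums, out=np.zeros_like(H), where=sums > 0)

def class_pair_similarity(H, labels, c1, c2):
    """Average cosine similarity of two-hop histograms between two classes."""
    A, B = H[labels == c1], H[labels == c2]
    A = A / np.clip(np.linalg.norm(A, axis=1, keepdims=True), 1e-12, None)
    B = B / np.clip(np.linalg.norm(B, axis=1, keepdims=True), 1e-12, None)
    return (A @ B.T).mean()
```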
Objective: Accurate visual classification of bladder tissue during Trans-Urethral Resection of Bladder Tumor (TURBT) procedures is essential to improve early cancer diagnosis and treatment. During TURBT interventions, White Light Imaging (WLI) and Narrow Band Imaging (NBI) techniques are used for lesion detection. Each imaging technique provides diverse visual information that allows clinicians to identify and classify cancerous lesions. Computer vision methods that use both imaging techniques could improve endoscopic diagnosis. We address the challenge of tissue classification when annotations are available in only one domain, in our case WLI, and the endoscopic images form an unpaired dataset, i.e. there is no exact equivalent for every image across the NBI and WLI domains. Method: We propose a semi-supervised Generative Adversarial Network (GAN)-based method composed of three main components: a teacher network trained on the labeled WLI data; a cycle-consistency GAN that performs unpaired image-to-image translation; and a multi-input student network. To ensure the quality of the synthetic images generated by the proposed GAN, we perform a detailed quantitative and qualitative analysis with the help of specialists. Results: The overall average classification accuracy, precision, and recall obtained with the proposed method for tissue classification are 0.90, 0.88, and 0.89 respectively, while the same metrics obtained in the unlabeled domain (NBI) are 0.92, 0.64, and 0.94 respectively. The quality of the generated images is reliable enough to deceive specialists. Significance: This study shows the potential of semi-supervised GAN-based classification to improve bladder tissue classification when annotations are limited in multi-domain data.
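The abstract names the three components but not their exact wiring; one common way to connect a domain translator to a teacher is confidence-filtered pseudo-labeling, sketched below with `teacher` and `nbi_to_wli` as hypothetical stand-ins for the trained teacher network and the cycle-consistency generator.

```python
def pseudo_label_unpaired(teacher, nbi_to_wli, nbi_images, confidence=0.9):
    """Pseudo-label unlabeled NBI images through a WLI-trained teacher.

    teacher(img)    -> class-probability list for a WLI image
    nbi_to_wli(img) -> GAN translation of an NBI image into the WLI domain

    Returns (image, label) pairs for student training; predictions below
    the confidence threshold are discarded.
    """
    pairs = []
    for img in nbi_images:
        probs = teacher(nbi_to_wli(img))               # classify the translated image
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= confidence:
            pairs.append((img, label))                 # keep the original NBI image
    return pairs
```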
Computational notebooks, such as Jupyter notebooks, are interactive computing environments that are ubiquitous among data scientists to perform data wrangling and analytic tasks. To measure the performance of AI pair programmers that automatically synthesize programs for those tasks given natural language (NL) intents from users, we build ARCADE, a benchmark of 1082 code generation problems using the pandas data analysis framework in data science notebooks. ARCADE features multiple rounds of NL-to-code problems from the same notebook. It requires a model to understand rich multi-modal contexts, such as existing notebook cells and their execution states as well as previous turns of interaction. To establish a strong baseline on this challenging task, we develop PaChiNCo, a 62B code language model (LM) for Python computational notebooks, which significantly outperforms public code LMs. Finally, we explore few-shot prompting strategies to elicit better code with step-by-step decomposition and NL explanation, showing the potential to improve the diversity and explainability of model predictions.
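The abstract does not show ARCADE's actual prompt format; the sketch below illustrates the general shape of a few-shot, notebook-context prompt with step-by-step decomposition, using hypothetical helper names and exemplars.

```python
def build_prompt(notebook_cells, nl_intent, exemplars):
    """Assemble a few-shot prompt for NL-to-pandas code generation.

    notebook_cells -> preceding code cells (the multi-turn notebook context)
    nl_intent      -> the user's natural-language request
    exemplars      -> (intent, step-by-step solution) pairs for few-shot prompting
    """
    parts = [f"# Task: {intent}\n{code}\n" for intent, code in exemplars]
    parts.append("\n".join(notebook_cells))            # current notebook state
    parts.append(f"# Task: {nl_intent}\n# Solution, step by step:")
    return "\n".join(parts)

prompt = build_prompt(
    notebook_cells=["import pandas as pd", "df = pd.read_csv('orders.csv')"],
    nl_intent="count orders per customer, sorted descending",
    exemplars=[("show the first 3 rows", "# Step 1: preview the frame\ndf.head(3)")],
)
print(prompt)
```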